1. SELECTION IN RECOGNITION


A key problem in object recognition is selection, namely, the problem of identifying regions in an image within which to start the recognition process, ideally by isolating regions that are likely to come from a single object. Model-based object recognition methods, which try to determine which members of their library of models are present in the scene, usually use geometric features such as points or edges. They try to identify pairings between data and model features that are consistent with a rigid transformation of the object model into image coordinates. In cluttered scenes, the large number of such pairings that must be examined leads to a combinatorially explosive search. It has been shown that this search can be reduced considerably if recognition systems are equipped with a selection stage that isolates subsets of data features likely to come from a single object. This allows the search to focus on the matches most likely to lead to a correct solution [12]. This isolation can be based solely on image data (data-driven) or can incorporate knowledge of the model (task-driven or model-driven). In addition, it is desirable to order these subsets of data features so that the more promising ones, i.e., those more likely to point to a single object, are explored first. Ordering not only increases the likelihood of obtaining a good match earlier, but is also useful when the task is to recognize as many objects as possible in a scene. Thus the goals of selection in recognition are twofold: to isolate areas in the image that are likely to come from a single object, and to order these areas so that the more promising ones are explored first. These goals differ from those of segmentation, where the problem is to partition the image into regions that each contain a single object. In selection, by contrast, it is not essential to isolate regions that wholly contain a single object, nor is it necessary to partition the entire image into object-containing regions.

Even though selection can aid recognition, it has remained largely unsolved. What makes selection so difficult? In the ideal case, several conditions would hold: the appearance of the desired object in the scene would be known, the objects would be well separated and distinguishable from the background, and the illumination conditions would be known. Under those conditions, even simple methods relying on intensity measurements would extract groups of features well. In reality, however, the appearance of the object is not known. In addition, the illumination conditions and the surface geometries of objects in a scene cause occlusion, shadowing, specularities, and inter-reflections in the image. These make groups of data features such as edges and lines difficult to interpret. Previous approaches have focused on data-driven selection, grouping data features such as edges, lines, and points. The grouping has been based on constraints such as parallelism and collinearity [19], distance and orientation [18], and regions enclosed by a group of edges [6]. The extent to which such grouping methods reduce the search in recognition depends on the reliability of the groups produced (i.e., how many of them really come from a single object). Maintaining this reliability was found to be difficult using constraints such as those listed above. The general problem of selection therefore remains largely unsolved, since it is still not obvious how to reliably characterize subsets of data features that point to a single object. There thus appears to be a need for a computational model of selection that explains both data-driven and task-driven selection.

We have been building one such model, which proposes that selection be accomplished via an attention mechanism. Specifically, it is an attempt to build a computational model of the human visual attention phenomenon and to propose it as a selection mechanism for recognition. This involves isolating two modes of human attentional behavior, namely the attracted-attention and pay-attention modes, to serve as paradigms for data-driven and model-driven selection, respectively. The attracted-attention mode is spontaneous. It is commonly exhibited by an unbiased observer (i.e., one with no a priori intentions) when some objects or aspects of the scene attract his or her attention. The pay-attention mode is a more deliberate behavior. It is exhibited by an observer who looks at a scene with a priori goals (such as the task of recognizing a particular object) and hence pays attention only to those objects or aspects of the scene that are relevant to the goal. According to this model, therefore, data-driven selection can be achieved by identifying regions in an image that attract attention (i.e., that are distinctive) with respect to some feature such as color or texture. Model-driven selection, in turn, can be achieved by paying attention to the model features (i.e., using the model features to decide the saliency of features in the image). It is understandable that paying attention to model features can help isolate areas of the image containing subsets of data features likely to come from a single object (in this case, the model object). It is not immediately apparent, however, how locating salient regions serves the goals of selection. The choice is motivated by the following considerations. First, an object is often observed to stand out in a scene because of some distinctive features that are usually localized to a portion of the object. Isolating distinctive regions is therefore more likely to point to a single object. Second, a distinctive region, if suitably found, can limit the number of candidate models from the library that can potentially match the given data. This holds especially if only a few models in the library possess the features that made the data region distinctive. Third, it has often been observed that the first objects recognized in a scene are those that attract an observer's attention [15]. Ordering regions by distinctiveness to decide which objects to recognize first is therefore in keeping with this observation. Finally, a number of other approaches have also suggested that selection, at least data-driven selection, can be performed using some measure of saliency. Examples include the structural saliency of curves [29] and saliency defined by local differences in contrast, color, or size [8, 24, 28].

The above discussion indicates a framework in which data-driven and model-driven selection can be achieved. But how can salient regions be found in the image independently of the model, and how can the object model affect the choice of regions? The purpose of this paper is to present a method of selection that restricts attention to one particular feature, namely color. It shows how color regions can be extracted from the image and how they can be used to perform data-driven and model-driven selection. To give a flavor of the ensuing discussion, Figures 1-3 show some examples of data-driven and model-driven selection performed by our system. Figure 1a shows an image of a realistic indoor scene with shadows and inter-reflections, containing many types of objects. The different color regions found in this image are re-colored and shown in Figure 1b. The four most salient color regions found are shown in Figures 1c-1f; these regions span objects in the scene that are salient in color. Figures 2 and 3 show model-driven selection using color, with the model object shown in Figure 2a and the scene depicted in Figure 2c. The cluster of regions that our model-driven selection algorithm found to best satisfy the model's color region description is shown in Figure 3d.

The  rest  of  the  paper  discusses  how  this  kind  of  selection  can  be  achieved 
using  color.  It  is  organized  as  follows.  In  Section  2,  we  motivate  the  choice  of 
color  as  a  feature  to  study  selection,  and  outline  the  requirements  imposed  by 
selection  on  any  method  for  the  extraction  of  color  information.  Based  on  these 
guidelines,  an  approach  to  extracting  color  regions  is  presented  in  Section  3.  In 
Section  4,  a  measure  for  expressing  the  saliency  of  color  regions  is  presented  and 
its  effectiveness  for  data-driven  selection  is  examined.  Section  5  presents  a  way 
to  perform  model-driven  selection  based  on  the  color  regions.  Finally,  Section  6 
summarizes  our  approach  to  color-based  selection. 

2.  COLOR  IN  SELECTION 

2.1  Role  of  Color  in  Selection 

Color is known to be a strong cue for attracting an observer's attention, and humans often use color information to search for specific objects in a scene. It therefore seems natural to use color as a cue for selection in computer vision. The stronger motivation for using color, however, is that it provides region information. In addition, when specified appropriately, it can be relatively insensitive to variations in normal illumination conditions and in the appearance of objects [31]. A color region in the image almost always comes entirely from a single object. It therefore gives more reliable groups than existing grouping methods, which is useful for data-driven selection. Because objects tend to show color constancy under most illumination conditions, color can be a stable cue across most poses (appearances) of objects in scenes. This makes it suitable for model-driven selection as well.




2.2  Surface  Color,  Image  Color,  Perceptual  Color 

Although color is useful for selection, specifying the perceived color of objects has proven difficult in computer vision. By perceived color we mean the color perceived by humans looking at an image of the scene. Artifacts such as specularities (from shiny surfaces in the scene), inter-reflections, shading on surfaces, and shadowing all make it difficult to recover the actual color of objects in the scene from the image. Existing approaches have mainly focused on the problem of color constancy. Its goal is to extract surface color, i.e., the surface reflectance properties of objects, in order to obtain a stable perception of an object's color under varying illumination conditions. Because this problem is under-constrained, most methods make assumptions about the surface being imaged [23], about the illumination conditions [25, 14, 11, 32], or both [10]. Other approaches try to recover image color, i.e., the color of the objects as they appear under the present illumination conditions. These approaches account separately for artifacts such as specularities on shiny surfaces [22]. These methods, however, cannot ensure that the extracted color matches the perceived color of regions.

For the purposes of selection, what kind of color information should be extracted from regions? Is recovering image color sufficient, or should one attempt to recover surface color? We propose that, for both data-driven and model-driven selection, it is sufficient to specify a region by its perceived color. This holds provided that the effects of artifacts such as specularities are accounted for separately, as is done by image color recovery methods. Using perceptual color, two adjacent color regions are distinguished if their perceived colors differ, and this is sufficient for data-driven selection. Because objects tend to obey color constancy under most changes in illumination, their perceived color remains more or less the same. This makes perceptual color sufficient for model-driven selection as well. But can perceptual color be quantified at all? In general, several effects, such as simultaneous color contrast and color filling, are known to influence human color perception [34]. Fortunately, as we explain later, these factors are not very critical for selection.

2.3  Perceptual  Color  Specification  by  Categories 

In this section we present a method for specifying the perceptual color of image regions from the colors of their constituent pixels. The color of a pixel in an image is described by a triple <R,G,B> (henceforth called its specific color). It represents the components of image intensity at that point in three wavelength bands, usually red, green, and blue, corresponding to the filters used in color cameras. Mapping all possible triples into a 3-dimensional color space, with axes standing for pure red, green, and blue, yields a color space that represents the entire spectrum of computer-recordable colors. This color space can be partitioned into subspaces within which the color remains perceptually the same and which are distinctly different from neighboring subspaces. Such subspaces can be called perceptual color categories. Each pixel in the image maps to a point in this color space and hence falls into one of these categories, so the perceptual color of the pixel can be specified by its color category. To obtain the perceived color of a region from the perceptual colors of its constituent pixels, we observe the following. The individual pixels of an image color region may show considerable variation in their specific colors. Even so, the overall color of the region is fairly well determined by the color of the majority of its pixels (henceforth called the dominant color). Therefore, the perceived color of a region can be specified by the color category corresponding to its dominant color.
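The rule above labels a region by the category of its dominant color. It can be sketched as a plurality vote over the categories of the region's pixels. This minimal sketch assumes that each pixel has already been mapped to a category label. Labels such as "red-3" are hypothetical placeholders, not the authors' category names. Voting on categories directly is also a simplification of the two-stage procedure in the text, which first finds the dominant specific color and then looks up its category.

```python
from collections import Counter
from typing import Hashable, Iterable


def dominant_category(pixel_categories: Iterable[Hashable]) -> Hashable:
    """Return the colour category held by the most pixels of a region.

    This is the plurality category; ties are broken by first occurrence,
    because Counter.most_common orders equal counts by insertion order.
    """
    counts = Counter(pixel_categories)
    if not counts:
        raise ValueError("region has no pixels")
    category, _count = counts.most_common(1)[0]
    return category


# Hypothetical region: mostly one red category, some darker red pixels
# from shading, and a few near-white pixels from a highlight.
region = ["red-3"] * 50 + ["red-5"] * 30 + ["white-1"] * 5
print(dominant_category(region))  # red-3
```

A vote like this tolerates scattered shading and highlight pixels, provided they do not outnumber the body color.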

The category-based specification of perceptual color (of pixels or regions) is a good compromise between two alternatives. One is the specific color, which is extremely unstable with respect to changes in illumination conditions and similar factors. The other is the surface color, whose recovery is hard. Since the categories indicate perceptual color, they have the same beneficial effects on both data-driven and model-driven selection as recovering perceptual color would. For example, they give a reliable segmentation of the image into color regions and remain stable under changes in illumination conditions. In addition, the perceptual categories depend only on the color space and are independent of the image. They can therefore be computed in advance and stored in, say, a look-up table. Finally, a category-based description is in keeping with the idea of perceptual categorization that has been explored extensively in psychophysical studies [4, 5, 27]. These studies concluded that humans can discriminate between several thousand nuances of color. Psychophysically, however, we seem to partition the color space into relatively few distinct qualitative color sensations or categories [30].

2.4  Categorization  of  Color  Space 

The above discussion argued for the viability of an approach that recovers a color to within a category. Before this can be turned into a computational method of color recovery, one needs to address how such categories may be found. Previous work on color categorization involved experiments in which subjects named colors using a limited vocabulary, or identified colors using Munsell color charts [34]. For computational color recovery, however, we need a way to convert the camera-recorded red, green, and blue components of a color into perceptual color categories. We did this by performing some rather informal but extensive psychophysical experiments. These systematically examined a color space and recorded the places where qualitative color changes occur, thus determining the number of distinct color categories that can be perceived. The hue-saturation-value color space was used for this. It specifies a color in terms of its hue, purity, and brilliance, attributes that have been found to give a perceptual description of color [20]. The details of these experiments are described in Appendix A and will not be elaborated here, except to mention the following. The entire spectrum of computer-recordable colors (2^24 colors) was quantized into 7200 bins. These correspond to a 5-degree resolution in hue (72 hue levels) and 10 levels each of saturation and intensity, giving 72 x 10 x 10 = 7200 bins (see Figure 7). The color in each bin was then observed by displaying a mondrian (a uniform color patch) of that color on a monitor screen. The patch was viewed under dark-room conditions with appropriate monitor calibration. From these studies, we found that about 220 different color categories were sufficient to describe the color space. The color category information was then summarized in a color-look-up table. A finer level of quantization would have yielded more categories. A smaller set is actually more useful, however, since it gives a reasonably coarse description of the color of a region. This allows the description to remain the same under some variations in imaging conditions. In fact, the same method can also determine which categories can be grouped to give an even rougher description of a particular hue. This grouping was done and stored in a category-look-up table, which is indexed by the color categories given by the color-look-up table.
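The quantization described above can be sketched in Python. The 5-degree hue resolution gives 360 / 5 = 72 hue levels, and combined with 10 saturation and 10 intensity levels this yields 72 x 10 x 10 = 7200 bins. This minimal sketch rests on three assumptions. First, the bins are uniform over the HSV values returned by Python's standard `colorsys` module (each in [0, 1]). Second, the color-look-up table is represented by a hypothetical sequence `color_lut` that maps a bin index to a category id. Third, the toy table below, which simply uses the hue level as the category, stands in for illustration only. The real table was filled in from the psychophysical experiments.

```python
import colorsys
from typing import Iterable, List, Sequence, Tuple

HUE_STEP_DEG = 5                     # 5-degree hue resolution
N_HUE = 360 // HUE_STEP_DEG          # 72 hue levels
N_SAT = 10                           # saturation levels
N_VAL = 10                           # intensity (value) levels
N_BINS = N_HUE * N_SAT * N_VAL       # 72 * 10 * 10 = 7200 bins


def hsv_bin(r8: int, g8: int, b8: int) -> int:
    """Quantize an 8-bit RGB triple into one of the N_BINS HSV bins."""
    h, s, v = colorsys.rgb_to_hsv(r8 / 255.0, g8 / 255.0, b8 / 255.0)
    # colorsys returns h, s, v in [0, 1]; clamp so that s == 1 or v == 1
    # lands in the top level rather than one past it.
    hi = min(int(h * N_HUE), N_HUE - 1)
    si = min(int(s * N_SAT), N_SAT - 1)
    vi = min(int(v * N_VAL), N_VAL - 1)
    return (hi * N_SAT + si) * N_VAL + vi


def categorize_pixels(pixels: Iterable[Tuple[int, int, int]],
                      color_lut: Sequence[int]) -> List[int]:
    """Map each pixel to a colour category with one table lookup (O(N))."""
    return [color_lut[hsv_bin(*p)] for p in pixels]


# Toy table for illustration only (hypothetical): category = hue level.
# The real table would hold the ~220 experimentally found categories.
toy_lut = [b // (N_SAT * N_VAL) for b in range(N_BINS)]

print(N_BINS)                                                   # 7200
print(hsv_bin(255, 0, 0))                                       # 99
print(categorize_pixels([(255, 0, 0), (128, 0, 0)], toy_lut))   # [0, 0]
```

Because the table is precomputed, mapping an image to categories costs one table access per pixel. This matches the O(N) cost given for step 1 of the segmentation algorithm in Section 3.2.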

3.  COLOR  REGION  SEGMENTATION 

The  previous  section  described  how  to  specify  the  color  of  regions,  after  they 
have  been  isolated.  But  the  more  crucial  problem  is  to  identify  these  regions. 
In  this  section,  we  show  that  the  perceptual  categorization  principle  can  be  used 
to  determine  which  pixels  can  be  grouped  to  form  regions  in  an  image.  If  each 
surface  in  the  scene  were  a  mondrian,  then  all  its  pixels  would  belong  to  a  single 
color  category,  so  that  by  grouping  spatially  close  pixels  belonging  to  a  category, 
the desired segmentation of the image can be obtained. Real surfaces, however, are rarely mondrians, so the pixels of a region from such a surface seldom all belong to the same color category. They can show considerable variation in color, with bright and dark pixels intermixed and possibly spurious pixels present as well. We now analyse some of the color variations across an image that can result from imaging a colored surface in the scene.

3.1  Variation  of  Color  Across  an  Image  of  a  3D-Surface 

In this section we use some assumptions to show that the color variation across an image of a surface is mostly in intensity. When a surface is imaged, the light falling on the image plane (image irradiance) is related to the physical properties of the scene being imaged via the image irradiance equation:

$$ I(\lambda, r) = \rho(\lambda, r)\, F(k, n, s)\, E(\lambda, r) \tag{1} $$

where λ is the wavelength and r is the spatial coordinate of a surface point (identified with its projection in the image). E(λ, r) is the intensity of the ambient illumination. ρ(λ, r) is the component of surface reflectance that depends only on the material properties of the surface, and hence specifies its surface color. F(k, n, s) is the component of surface reflectivity that depends on surface geometry, with k, s, and n being the viewer direction, the source direction, and the surface normal, respectively. The image irradiance equation assumes that all surfaces in a scene reflect light according to a single reflectivity function. It can easily be reinterpreted, however, to represent the image irradiance of a single surface. Under the assumption of a single light source, the illumination E(λ, r) can be separated into a product of two terms, E_1(λ) and E_2(r). Since the directions k, n, and s vary only with the position r on the surface, F(k, n, s) can be written as F(r). The image irradiance equation can then be rewritten as

$$ I(\lambda, r) = \rho(\lambda, r)\, F(r)\, E_1(\lambda)\, E_2(r) \tag{2} $$

The surface reflectance, and hence the resulting appearance of a surface, is determined by both the composition and the concentration of the pigments of the material constituting the surface. For most surfaces, the composition of the pigments can be considered independent of their concentration. The spectral reflectance ρ(λ, r) can then be written as a product of two terms, ρ_1(λ) and ρ_2(r). Note that this assumption is less restrictive than the assumption of homogeneity that has been used before [14]. With this simplification, we group the terms that depend on λ into L(λ) = ρ_1(λ) E_1(λ) and those that depend on r into H(r) = ρ_2(r) F(r) E_2(r). The image irradiance equation then becomes

$$ I(\lambda, r) = H(r)\, L(\lambda) \tag{3} $$

Now consider the filtered version of this signal, i.e., the image irradiance in three channels, say the red, green, and blue channels with associated transfer functions h_R(λ), h_G(λ), and h_B(λ). The specific color at each pixel location r is then specified by the triple <R(r), G(r), B(r)>, where


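$$ R(r) = \int I(\lambda, r)\, h_R(\lambda)\, d\lambda = H(r) \int L(\lambda)\, h_R(\lambda)\, d\lambda = H(r)\, R_0 \tag{4} $$

$$ G(r) = \int I(\lambda, r)\, h_G(\lambda)\, d\lambda = H(r) \int L(\lambda)\, h_G(\lambda)\, d\lambda = H(r)\, G_0 \tag{5} $$

$$ B(r) = \int I(\lambda, r)\, h_B(\lambda)\, d\lambda = H(r) \int L(\lambda)\, h_B(\lambda)\, d\lambda = H(r)\, B_0 \tag{6} $$

where R_0, G_0, and B_0 are constants that do not depend on r.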

This shows that, under the given assumptions (which admit non-homogeneous surfaces), the color of a surface can vary only in intensity. Every pixel's triple is the same fixed triple scaled by H(r), so the ratios R:G:B, and hence hue and saturation, are constant across the surface. In practice, the separability assumption on reflectance may not be satisfied, or there may be more than one light source in the scene. Even then, the general observation is that the intensity and purity of colors are affected but the hue remains fairly constant. In terms of categories, this means that the pixels of a surface belong to compatible categories. Compatible categories share the same overall hue but vary in intensity and saturation. Conversely, if we group pixels belonging to a single category, then each physical surface is spanned by multiple overlapping regions belonging to such compatible color categories. These are the categories that were grouped in the category-look-up table mentioned in Section 2.4. The next section describes how these concepts can be put together into a color image segmentation algorithm.
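The derivation can be checked numerically. Scaling a fixed RGB triple by a shading factor, as H(r) does in equations (4)-(6), should leave the HSV hue and saturation unchanged and scale only the value. This small sketch uses Python's standard `colorsys` module. The base triple is an arbitrary illustrative choice, and sensor clipping and noise are ignored.

```python
import colorsys
import math


def hsv_under_shading(rgb, shading):
    """Scale an RGB triple (components in [0, 1]) by a shading factor H(r)
    and return the result in HSV, as computed by colorsys."""
    r, g, b = (shading * c for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)


# Arbitrary body colour, standing in for the fixed triple in eqs. (4)-(6).
base = (0.6, 0.3, 0.1)
for h_r in (1.0, 0.5, 0.2):
    h, s, v = hsv_under_shading(base, h_r)
    print(f"H(r)={h_r:.1f}  hue={h:.4f}  sat={s:.4f}  val={v:.4f}")

# Hue and saturation match the unshaded values; only the value scales.
h1, s1, v1 = hsv_under_shading(base, 1.0)
h2, s2, v2 = hsv_under_shading(base, 0.2)
assert math.isclose(h1, h2, abs_tol=1e-9) and math.isclose(s1, s2, abs_tol=1e-9)
assert math.isclose(v2, 0.2 * v1, abs_tol=1e-9)
```

HSV hue and saturation depend only on the ratios between the channels, while value is the largest channel. A pure intensity change therefore moves a pixel only along the value axis. In the categorized color space, this means moving between categories of the same hue.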
3.2 Color Region Segmentation Algorithm

The algorithm for color image segmentation performs the following steps: (1) it maps all pixels to their categories in color space; (2) it groups pixels belonging to the same category; and (3) it merges overlapping regions in the image that are of compatible color categories.

1. Mapping pixels to categories: This is done by simply indexing the color-look-up table with the color of the pixel, specified in terms of its hue, saturation, and brightness components. These components can be derived from the specific color as described in [9]. This step takes O(N) time, where N is the size of the image.

2. Grouping pixels of the same category: The image is divided into small non-overlapping bins of fixed size (say, 8 x 8), and the color categories found in each bin are recorded. The bin size can be chosen based on expectations about the average size of color regions found in natural scenes. Each bin thus has a list of color categories summarizing the pixel color information in it. Neighboring bins that contain a common color category can be grouped to give a connected component representing an image region of that category. Since a bin has several color categories, it belongs to several overlapping connected components. The grouping algorithm we used is a sequential, non-recursive labeling algorithm that simultaneously assembles all the overlapping connected components from the category descriptions in the bins. It is an extended version of the labeling algorithm for binary images described earlier [13]. It uses the union-find data structure to efficiently merge category labels into connected components, taking O(k^2 M) time. Here M is the number of bins (windows) and k is the maximum number of categories present in a bin (k = O(1) for small bin sizes, e.g., 8 x 8). The resulting labels are propagated back to the pixels to give the precise boundaries of color regions of single color categories. The color of each region is then specified by the color category and the specific color of the dominant color in the region, as described in Section 2.3.

3. Merging overlapping regions: Determining which regions overlap in the image can, in general, be computationally intensive. It involves determining which polygonal regions intersect and finding their regions of intersection. By using the bin-wise representation of connected components, however, overlapping regions can be detected and combined much more easily. From the discussion in Section 3.1, a shaded region maps to categories in color space that are compatible, i.e., that have the same overall hue. The compatible categories are available from the category-look-up table described in Section 2.4. The algorithm must find all regions that have compatible categories and overlap in image space. To do so, it examines each bin of the image to see whether it contains interior portions of regions of compatible color categories. Such overlapping regions are grouped as in step 2. This step again takes O(k^2 M) time. Finally, the bin-level color labels are propagated back to the corresponding pixels to give an accurate localization of the color region boundaries.
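Step 2 can be sketched with a union-find structure whose nodes are (bin, category) pairs, so that a bin holding several categories joins several overlapping components. This minimal sketch assumes 4-connectivity between bins and assumes that each bin's category set has already been computed. It is not the authors' sequential labeling algorithm [13], and it omits the final propagation of labels back to pixels. The grid in the usage example is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int, int]  # (bin row, bin column, colour category)


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self) -> None:
        self.parent: Dict[Node, Node] = {}

    def add(self, x: Node) -> None:
        self.parent.setdefault(x, x)

    def find(self, x: Node) -> Node:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:          # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: Node, b: Node) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def label_bins(bins: List[List[Set[int]]]) -> Dict[Node, Node]:
    """Label overlapping same-category connected components of a bin grid.

    bins[i][j] is the set of categories present in bin (i, j). Each
    (bin, category) pair is a node; 4-connected bins sharing a category
    are joined, so one bin can belong to several components.
    """
    uf = UnionFind()
    for i, row in enumerate(bins):
        for j, cats in enumerate(row):
            for cat in cats:
                node = (i, j, cat)
                uf.add(node)
                # Only the upper and left neighbours have been visited yet.
                if i > 0 and cat in bins[i - 1][j]:
                    uf.union((i - 1, j, cat), node)
                if j > 0 and cat in bins[i][j - 1]:
                    uf.union((i, j - 1, cat), node)
    return {node: uf.find(node) for node in list(uf.parent)}


# Hypothetical 2 x 3 grid of bins with their category sets.
grid = [
    [{1}, {1, 2}, {3}],
    [{2}, {2},    {1}],
]
labels = label_bins(grid)
print(len(set(labels.values())))  # 4
```

Each (bin, category) node is joined to at most two previously visited neighbours. The bin in row 0, column 1 belongs to both a category-1 component and a category-2 component, which is the overlap that step 3 later resolves.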

The  algorithm  for  color  image  segmentation  thus  makes  only  a  constant  number 
of  passes  through  the  image,  each  being  linear  in  the  size  of  the  image. 
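The merge in step 3 can be sketched in the same style. This minimal sketch makes three assumptions. First, step 2 has already produced region labels. Second, for every bin we know which regions' interiors cover it. Third, the category-look-up table is represented by a hypothetical mapping `hue_family` from a category to its hue family. Regions of the same hue family that co-occur in a bin are then merged. It illustrates the idea under those assumptions and is not the authors' implementation.

```python
from typing import Dict, Hashable, Iterable, Set


def merge_compatible(bin_regions: Iterable[Set[int]],
                     region_category: Dict[int, int],
                     hue_family: Dict[int, Hashable]) -> Dict[int, int]:
    """Merge regions of compatible categories that share a bin.

    bin_regions:     for each bin, the region labels whose interior covers it.
    region_category: region label -> colour category (from step 2).
    hue_family:      category -> hue family; a hypothetical stand-in for
                     the category-look-up table.
    Returns region label -> merged-region label.
    """
    parent = {reg: reg for reg in region_category}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for regions in bin_regions:
        first_in_family: Dict[Hashable, int] = {}
        for reg in sorted(regions):
            fam = hue_family[region_category[reg]]
            if fam in first_in_family:
                ra, rb = find(first_in_family[fam]), find(reg)
                if ra != rb:
                    parent[rb] = ra
            else:
                first_in_family[fam] = reg
    return {reg: find(reg) for reg in list(parent)}


# Hypothetical regions: 0 = light red, 1 = dark red (shaded), 2 = blue.
region_category = {0: 10, 1: 11, 2: 20}
hue_family = {10: "red", 11: "red", 20: "blue"}
bins = [{0, 1}, {1, 2}, {2}]
merged = merge_compatible(bins, region_category, hue_family)
print(merged)  # {0: 0, 1: 0, 2: 2}
```

The light and dark red regions overlap in a bin and share a hue family, so they merge. The blue region also overlaps the dark red one, but it has a different hue and stays separate.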

3.3  Handling  Specularities 

The above algorithm segments the image into regions according to their perceived color. As described before, this is sufficient for data-driven selection. For model-driven selection, however, such a description needs to be augmented with knowledge of artifacts that occur in the image, such as specularities, shadows, or inter-reflections. These artifacts can cause a model region to appear fragmented. For example, a sharp streak of specularity on a surface can cleave its image into two regions. Identifying and correcting these artifacts would therefore improve the effectiveness of a color-based model-driven selection system. We now discuss how one of these artifacts, namely specularities, can be handled once the color regions have been isolated. Specularities are present in regions produced by objects in the scene with shiny surfaces, such as metallic objects and dielectrics. These specularities have a central bright portion that appears white under most illumination conditions (bright sunlight, daylight, tube light). Toward the specularity boundary, this bright portion tapers off and merges into the rest of the body color. When such specular regions and their adjacent colored regions are projected into a color space, they form characteristic clusters such as the skewed T described in [21]. These clusters can therefore be analysed to detect and remove highlights using the method described in that paper.

3.4  Results 

Figures 4-6 demonstrate the color region segmentation algorithm. Figure 4a shows a 256 x 256 pixel image of a color pattern on a plastic bag. The folds in the bag and its plastic material together give a glossy appearance in the image, as can be seen in the big S and Y. The result of step 2 of the algorithm is shown in Figure 4b. There it can be seen that the glossy portions of the big blue Y and the red S cause overlapping color regions. These are merged in step 3, and the result is shown in Figure 4c. As the figure shows, the algorithm achieves a fairly good segmentation of the scene for such surfaces. Figure 5 shows another image, consisting of colored pieces of cloth, in which the textured region has several small colored regions within it. The results of the algorithm (Figure 5c) show that even such colored regions can be reliably isolated. Another example of color region extraction (Figure 1) was mentioned earlier in Section 1. Notice in the segmented image of Figure 1b that adjacent objects of the same perceptual color (the grey books) are merged. This is to be expected, because the grouping of regions is based on color information alone.
4. COLOR-BASED DATA-DRIVEN SELECTION


The segmentation algorithm described above gives a large number of color
regions. Some of these may span more than one object, while others come from
scene clutter rather than from objects of interest. For the purposes of recognition,
it would be useful to order these regions and consider only some of them, so that by
isolating data subsets from such regions the search can be focused on key groups
of features, excluding much of the scene clutter. Based on the rationale given in
Section 1, we propose that the color regions be ordered by their saliency, i.e., by
how distinctive they appear. The method of color-based selection, therefore, is to
extract color regions from the image, order them by a measure of color saliency,
and then pass the few most salient regions to a recognition system. In this section
we first describe a measure of color saliency and then examine the utility of
salient-region selection in recognition.

4.1  Finding  Salient  Color  Regions  in  Images 

In trying to express distinctiveness, one encounters the question: is distinctiveness
expressible at all? In general, any judgement of distinctiveness has both a sensory
and a subjective component. For example, while most of us perceive brighter colors
more easily than duller ones, the judgement of which of two hues of the same
brightness and saturation is more salient can be subjective. The aim here is to
focus on the sensory component of distinctiveness and hence to extract properties
of regions that are general enough to be perceived by most observers.
Accordingly, we propose that the saliency of a color region be composed of two
components, namely, self-saliency and relative saliency. Self-saliency determines
how conspicuous a region is on its own and measures some intrinsic properties
of the region, while relative saliency measures how distinctive the region appears
when there are regions of competing distinctiveness in the neighborhood.

In  order  to  develop  such  a  measure  for  color-region  saliency  one  has  to  ask 
the  following  questions:  What  features  in  regions  determine  their  saliency?  How 
can  they  be  measured  to  reflect  our  sensory  judgments?  Finally,  how  can  they 
be  combined  to  give  the  saliency  measure?  We  now  address  these  questions  and 
derive  a  measure  of  color-saliency. 

4.1.1  Features  used  for  measuring  self  and  relative  saliency 

Since  the  saliency  of  a  color  region  depends  on  the  region  features  used,  they 
must  be  carefully  selected.  Such  features  should  be:  (i)  perceptually  important, 
(ii)  easily  measurable,  and  (iii)  fairly  general,  to  avoid  subjective  bias. 

1. Color: The color of a region is an intrinsic property and affects the region's
self-saliency. It is specified by (s(R), v(R)), where s(R) is the saturation or purity
of the color of region R, v(R) is its brightness, and 0 ≤ s(R), v(R) ≤ 1.0. The hue
of colors is not considered, to avoid subjective bias.
2. Region size: The size of a region is again an intrinsic property and affects its
self-saliency.  It  is  chosen  as  a  feature  based  on  the  observation  that  regions  that 
are  either  very  small  in  extent,  or  that  are  large  enough  to  cover  the  entire  field  of 
view,  do  not  often  attract  our  attention.  Also,  very  large  regions  can  potentially 
span  more  than  one  object,  making  them  unsuitable  for  selection.  The  size  feature 
is  expressed  by  the  normalized  size  r(R)  =  Size(R)/Image-size. 

3. Color contrast: The color contrast a region shows with its neighbors affects
its relative saliency. The rationale behind choosing color contrast is that even if a
region has an interesting intrinsic color, it may not be distinctive if all its neighbors
also have equally interesting colors, unless it shows the greatest contrast. It is
difficult to express color contrast as a numerical measure that accounts for the
variations in an observer's judgement with the conditions of observation and with
the size, shape, and absolute color of the stimuli [34]. In the color contrast measure
we chose, we augmented an empirical color difference formula for predicting observed
color differences with knowledge of the hues of the colors, derived from their
categorical representation. Specifically, the following difference formula d(C_R, C_T)
was used to measure the color difference between two color regions R and T with
colors C_R = (r_0, g_0, b_0)^T and C_T = (r, g, b)^T:


$$ d(C_R, C_T) = \sqrt{\left(\frac{r_0}{r_0+g_0+b_0} - \frac{r}{r+g+b}\right)^2 + \left(\frac{g_0}{r_0+g_0+b_0} - \frac{g}{r+g+b}\right)^2} \qquad (7) $$


As this measure does not explicitly take into account the hues of the colors, the
color-category-based representation is used to ascertain whether the hues of the two
regions differ, and the extent of the difference is then judged using d(C_R, C_T) in
such a way that the contrast between regions of different hue is emphasized. This
allows the measure to handle simultaneous color contrast to some extent. The
measure c(R, T) is given by:


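$$
c(R,T) =
\begin{cases}
k_1\, d(C_R, C_T) & \text{if } R \text{ and } T \text{ are of the same hue} \\
k_2 + k_1\, d(C_R, C_T) & \text{otherwise}
\end{cases}
\qquad (8)
$$

where k_1 = 1/(2√2) and k_2 = 0.5, so that 0 ≤ c(R, T) ≤ 1.0.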

4. Size contrast: Size contrast is a feature for determining relative saliency and
is chosen because it indicates whether a region is mostly in the background or in the
foreground. The size contrast of a region R with respect to an adjacent region T
is simply their relative size (area), given by


$$ t(R,T) = \min\left(\frac{\mathrm{size}(R)}{\mathrm{size}(T)},\ \frac{\mathrm{size}(T)}{\mathrm{size}(R)}\right) \qquad (9) $$
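The Python sketch below shows one way Equations (7)-(9) could be computed. It is a minimal sketch under stated assumptions: colors are (r, g, b) triples, k_1 = 1/(2√2) (the value that maps the largest possible chromaticity distance, √2, to 0.5 and keeps c(R, T) within [0, 1]), k_2 = 0.5 as in the text, and t(R, T) is read as the ratio of the smaller area to the larger. The same-hue decision is supplied by the caller because the color-category comparison is not modelled, and treating pure black as achromatic is a choice made only for this sketch. All function names are hypothetical.

```python
import math

# k_2 = 0.5 is stated in the text. k_1 = 1/(2*sqrt(2)) is an assumption:
# it maps the largest chromaticity distance, sqrt(2), to 0.5, which keeps
# c(R, T) within [0, 1].
K1 = 1.0 / (2.0 * math.sqrt(2.0))
K2 = 0.5


def chromaticity(rgb):
    """Normalized (r, g) chromaticity of an (r, g, b) triple."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        return 1.0 / 3.0, 1.0 / 3.0  # assumption: treat black as achromatic
    return r / total, g / total


def color_difference(c_r, c_t):
    """Eq. (7): Euclidean distance between the two (r, g) chromaticities."""
    r0, g0 = chromaticity(c_r)
    r1, g1 = chromaticity(c_t)
    return math.hypot(r0 - r1, g0 - g1)


def color_contrast(c_r, c_t, same_hue):
    """Eq. (8); same_hue must come from a separate color-category comparison."""
    d = color_difference(c_r, c_t)
    return K1 * d if same_hue else K2 + K1 * d


def size_contrast(size_r, size_t):
    """Eq. (9), read as the ratio of the smaller area to the larger one."""
    return min(size_r / size_t, size_t / size_r)


print(color_contrast((255, 0, 0), (0, 255, 0), same_hue=False))
print(size_contrast(100, 400))  # 0.25
```

Pure red against pure green is the extreme case: the chromaticity distance is √2, so the different-hue contrast reaches (up to rounding) the upper limit of 1.0.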


Since a region R in general has several neighboring regions, the color contrast
c(R) and size contrast t(R) of a region R are measured relative to a best neighbor
T_best for each region, so that c(R) = c(R, T_best) and t(R) = t(R, T_best). T_best is
the neighboring region that is ranked highest when all neighbors are sorted first
by size, then by extent of surround, and finally by contrast (size or color contrast,
as the case may be).
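The best-neighbor rule amounts to a lexicographic comparison. The sketch below assumes that larger values rank higher on every key and represents "extent of surround" as the fraction of R's boundary shared with the neighbor; both readings, and the `Neighbor` record itself, are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Neighbor:
    """Hypothetical record describing one region adjacent to R."""
    name: str
    size: float      # area of the neighbor, in pixels
    surround: float  # fraction of R's boundary shared with the neighbor
    contrast: float  # c(R, T) or t(R, T), depending on which is being computed


def best_neighbor(neighbors: List[Neighbor]) -> Neighbor:
    """Highest-ranked neighbor: compare size, then surround, then contrast."""
    return max(neighbors, key=lambda n: (n.size, n.surround, n.contrast))


neighbors = [
    Neighbor("wall", size=5000, surround=0.2, contrast=0.9),
    Neighbor("table", size=5000, surround=0.6, contrast=0.1),
    Neighbor("cup", size=800, surround=0.9, contrast=1.0),
]
print(best_neighbor(neighbors).name)  # table
```

Because Python compares tuples element by element, surround only breaks ties in size, and contrast only breaks ties in both.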


4.1.2 Combining features for self-saliency: To determine self-saliency from
the chosen features, they are weighted appropriately to reflect their importance.
The self-saliency measure chosen emphasizes purer and brighter colors over darker
and duller colors by choosing the weighting functions for saturation and brightness
as f_1(s(R)) = 0.5 s(R) and f_2(v(R)) = 0.5 v(R), respectively. The size of a region is
given a nonlinear weight f_3 to deemphasize both very small and very large regions,
as they do not often attract our attention. The corresponding weighting function has
sharp as well as smoothly rising and falling phases, determined by the breakpoints
shown in Figure 8a and the equation below.^ Here n stands for the region size r(R).


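$$
f_3(n) =
\begin{cases}
\cdots & 0 \le n < t_1 \\
1 & t_1 \le n \le t_2 \\
s_2 - c_3 \ln(\cdots) & t_2 < n \le t_3 \\
s_3\, e^{-c_4 (n - t_3)} & t_3 < n \le t_4 \\
0 & t_4 < n \le 1.0
\end{cases}
\qquad (10)
$$

where t_1 = 0.1, t_2 = 0.4, t_3 = 0.5, t_4 = 0.75, s_1 = 0.8, s_2 = 1.0, s_3 = 0.7,
s_4 = 10^{…}, c_1, c_3, and c_4 are rate constants, and n is the size of region R, r(R).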


4.1.3  Combining  features  for  relative  saliency: 

Once again, the chosen features are weighted appropriately to determine relative
saliency. The color contrast is weighted linearly, by the function f_4(c(R)) = c(R),
to emphasize regions showing greater contrast. The relative size is weighted
exponentially, by a function of the form f_5(t(R)) = 1 − e^{−k t(R)} with an
empirically chosen rate k > 0, to favor situations in which a region and its best
neighbor have approximately the same size.^


4.1.4  Finding  self  and  relative  saliency 

Once the various features determining self and relative saliency are appropriately
weighted, they reinforce each other, so that the self and relative saliencies can be
given by simple additive combinations of their individual features. The self-saliency
of a region R, denoted by SS(R), is given as f_1(s(R)) + f_2(v(R)) + f_3(r(R)).
Similarly, the relative saliency of the region R, RS(R), is given by f_4(c(R)) +
f_5(t(R)). Finally, the overall saliency of a region R is expressed by a linear
combination of self and relative saliency, SS(R) + RS(R), using the following
rationale. Any combination method should be flexible enough to allow a region to be
declared salient if it shows good contrast (i.e., high relative saliency), even though
it may not be interesting on its own. Conversely, a region that is interesting on its
own but does not stand out among its neighboring regions should not be chosen. On
the basis of these observations alone, nonlinear combining methods such as
SS(R) · RS(R) or max(SS(R), RS(R)) are not suitable. If a region is interesting both
on its own and in the presence of other regions in the scene, then it must be given
more importance. All three criteria are satisfied when the two saliency components
are linearly combined. The color saliency of a region R is therefore given by

$$ \text{Color-saliency}(R) = f_1(s(R)) + f_2(v(R)) + f_3(r(R)) + f_4(c(R)) + f_5(t(R)) \qquad (11) $$

^Such a function, along with the thresholds and rates of change, was empirically
derived from informal psychophysical experiments (whose details will not be
elaborated here) performed using color regions of various sizes.

^Once again, this function was obtained by performing informal psychophysical
experiments.

The saliency measure described above does not fully account for the perceptual
effects of simultaneous color contrast, color filling, and the like. Because such
effects do not greatly undermine a region that is already very outstanding (very
salient), and because saliency is being used to rank the regions, we have ignored
these effects.

The color regions in the image can now be ordered using the saliency measure,
and a few of the most significant regions (henceforth called salient regions) can be
retained for selection. The number of salient regions to be retained can be determined
when the selection mechanism is integrated with a recognition system to perform
a specific task, and is therefore left unspecified here.
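A minimal Python sketch of Equation (11) and the ranking step follows. It takes f_1 = 0.5 s, f_2 = 0.5 v and f_4 = c from the text. The size weight f_3 is passed in as a function, because parts of Equation (10) are not recoverable here. f_5 is assumed to have the form 1 − e^{−k t} with a hypothetical rate k = 3.0. The `RegionFeatures` record and both function names are illustrative, not the author's code.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RegionFeatures:
    """Hypothetical container for the five per-region features."""
    name: str
    s: float  # saturation s(R), in [0, 1]
    v: float  # brightness v(R), in [0, 1]
    r: float  # normalized size r(R) = Size(R) / ImageSize
    c: float  # color contrast c(R) with the best neighbor, in [0, 1]
    t: float  # size contrast t(R) with the best neighbor, in (0, 1]


def color_saliency(f: RegionFeatures, f3: Callable[[float], float],
                   k: float = 3.0) -> float:
    """Eq. (11): the sum SS(R) + RS(R) of the weighted features."""
    self_saliency = 0.5 * f.s + 0.5 * f.v + f3(f.r)       # f1 + f2 + f3
    relative_saliency = f.c + (1.0 - math.exp(-k * f.t))  # f4 + f5 (assumed form)
    return self_saliency + relative_saliency


def most_salient(regions: List[RegionFeatures],
                 f3: Callable[[float], float],
                 n_s: int, k: float = 3.0) -> List[RegionFeatures]:
    """Order regions by decreasing saliency and keep the top n_s."""
    ranked = sorted(regions, key=lambda f: color_saliency(f, f3, k),
                    reverse=True)
    return ranked[:n_s]


# Usage with a crude stand-in for f3 (NOT the weighting of Eq. (10)).
plateau = lambda n: 1.0 if 0.1 <= n <= 0.4 else 0.0
regions = [
    RegionFeatures("dull_background", s=0.2, v=0.3, r=0.9, c=0.1, t=0.1),
    RegionFeatures("red_label", s=1.0, v=1.0, r=0.2, c=0.8, t=1.0),
]
print([f.name for f in most_salient(regions, plateau, n_s=1)])  # ['red_label']
```

Since the combination is a plain sum, a region with high relative saliency can still rank well even if its own color is dull, which matches the first criterion given for the linear combination.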

4.1.5  Results 

We now illustrate the ranking of regions produced by the color saliency measure
derived above. Figures 1c-1f show the four most distinctive regions found by
applying the color-saliency measure to all the color regions extracted from the
scene shown in Figure 1a. Figures 4d-4f, 5d-5f, and 6c-6f show the most salient
regions found in their respective scenes. In the experiments done so far, the
color-saliency measure was found to select fairly large, bright-colored regions that
showed good contrast with their neighbors and appeared perceptually significant.

4.2  Use  of  Salient  Color-based  Selection  in  Recognition 

Data-driven selection based on salient color regions is primarily useful when
the object of interest has at least one of its regions appearing salient in the given
scene. In such cases, the search for data features that match model features can
be restricted to the salient regions, thus avoiding needless search in other areas of
the image. By selecting salient color regions, we obtain a small number of groups
(a region is itself a group), each containing several features. It was shown in [7] that
such large groups are useful for indexing, i.e., for determining which regions
from models in a library could correspond to a given group. But when the task
is to recognize a single object, it is desirable to have small groups. For this,
existing grouping techniques can be applied to the data features found within the
color regions to obtain reliable small groups.

We now estimate the search reduction that can be achieved with such a selection
mechanism. Let (M, N) = total number of features (such as edges, lines, etc.) in
the model and image respectively. Let (…, …) = total number of color regions
in the model and image respectively. Let N_s = number of salient regions that are
retained in an image. Let g = average size of a group of data features within a
model or image. Let (G_M, G_N) = number of groups formed (using any existing
grouping scheme) in the model and image respectively. Finally, let G_{S_i} be the
number of groups in salient image region i. Using the alignment method of
recognition [16], at least three corresponding data features are needed to solve
for the pose (appearance) of the model of a rigid object in the image. If no
selection of the data features is done, then the brute-force search required to try
all possible triples is O(M^3 N^3). If selection is done by grouping methods only
(i.e., without color region selection), then the number of matches that need to be
tried is O(G_M G_N g^3 g^3), since only triples within groups need to be tried. But as
we mentioned before, grouping methods often make mistakes, so not all groups
contain features belonging to a single object. In at least one such study [6], of
the 150 or so groups isolated, about 83 groups actually came from single objects.
Most of the remaining 67 groups would not yield any consistent match and would
represent fruitless search. Consider the case when grouping of data features is done
within all the color regions. The grouping is then more reliable, and the number of
groups is also smaller (as groups straddling regions are not considered), so the
overall effect is to reduce the search. For example, with M = 200, N = 3000, g = 7,
G_M = 30, and G_N = 430 (numbers typical of indoor scenes), the search reduction
obtained by going from 70% reliability in simple grouping to >95% reliability in
grouping within color regions is ≈ 0.25 × 10^…, which is a considerable
improvement. Consider next the case when grouping is restricted to salient color
regions. The number of matches reduces further to O(Σ_{i=1}^{N_s} G_{S_i} G_M g^3 g^3),
since only the groups in the salient regions need be tried.
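The three match counts above can be evaluated directly. The short sketch below plugs in the values quoted in the text and ignores the constant factors hidden by the O(·) notation. The group counts used for the four salient regions are hypothetical, chosen only to show how the last formula is applied.

```python
def brute_force_matches(m: int, n: int) -> int:
    """O(M^3 N^3): every model triple against every image triple."""
    return m**3 * n**3


def grouped_matches(g_m: int, g_n: int, g: int) -> int:
    """O(G_M G_N g^3 g^3): triples are formed only within groups."""
    return g_m * g_n * g**3 * g**3


def salient_matches(salient_group_counts, g_m: int, g: int) -> int:
    """O(sum_i G_Si G_M g^3 g^3): only groups in salient regions are tried."""
    return sum(salient_group_counts) * g_m * g**3 * g**3


# Values quoted in the text; constant factors are ignored throughout.
M, N, g, G_M, G_N = 200, 3000, 7, 30, 430
print(f"{brute_force_matches(M, N):.2e}")     # 2.16e+17
print(f"{grouped_matches(G_M, G_N, g):.2e}")  # 1.52e+09
# Hypothetical group counts for four retained salient regions.
print(f"{salient_matches([12, 8, 6, 5], G_M, g):.2e}")  # 1.09e+08
```

Even before reliability is considered, restricting triples to groups removes roughly eight orders of magnitude from the brute-force count for these parameter values.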

To obtain an estimate of the number of matches, and of the time taken for matching,
in real scenes when color-based selection is used, we recorded the number of regions
(obtained by applying the segmentation algorithm of Section 3) and the number
of data features within regions in some selected models and scenes (Figures 1 and
2 show typical examples of the models and scenes tried). The regions were ordered
using the color saliency measure, and the four most salient regions were retained.
Search estimates were then obtained using the above formulas, assuming a grouping
scheme that gives a number of groups within a region bounded by

(number of features in the region) / (average size of the groups in the region).

This is a good bound on the number of groups produced by simple grouping schemes,
such as grouping 'g' closely spaced parallel lines in the region. The results of these
studies are shown in Table I. As can be seen from this table, the number of matches
is always smaller when salient color regions are used for selection. However, the
ultimate utility of such a selection mechanism can be accurately gauged only after
it is integrated with a recognition system. Current research is directed towards
this effort.


5.  COLOR-BASED  MODEL-DRIVEN  SELECTION 


The  previous  section  described  a  data-driven  selection  mechanism  that  was 
meant  for  an  object  of  interest  having  some  salient  color  regions.  This  will  not 
be  of  much  help  when  the  object  of  interest  is  not  salient  in  color  (but  salient 
in  some  other  domain,  say  texture)  or  is  not  salient  at  all.  In  such  cases,  the 
color  description  of  the  model  can  be  used  to  perform  selection.  We  now  describe 
one  such  color-based  model-driven  selection  mechanism.  Here,  given  a  color-based 
description  of  a  model  object,  the  task  is  to  locate  color  regions  that  satisfy  this 
description.  The  use  of  model  information  to  constrain  the  matching  of  model 
features to image features is not new. Several model-driven search restriction
techniques, such as generalized Hough transforms [17], heuristic termination [12],
and focal features [2, 1, 3], have been developed. The emphasis in these methods was on
geometric  constraints  that  can  prune  the  search  space  during  the  matching  stage 
of  recognition.  The  approach  we  present  here,  on  the  other  hand,  emphasizes 
some  global  relational  information  about  model  color  regions  to  prune  the  search 
space  prior  to  matching.  It  also  provides  possible  correspondences  between  model 
and  image  regions.  Such  a  correspondence  can  further  reduce  the  complexity  of 
recognition  because  the  search  for  pairing  model  features  to  data  features  can  be 
restricted  now  to  these  corresponding  regions  rather  than  all  image  regions.  Color 
information  in  the  model  object  has  been  used  before  to  search  for  instances  of 
the  object  in  the  given  image  of  a  scene  [31,  33].  These  approaches  represent 
model  and  image  color  information  by  color  histograms  and  perform  a  match  of 
the histograms. Such approaches usually produce many false positive identifications,
and  do  not  explicitly  address  some  of  the  problems  that  arise  in  going  from  a  model 
object  to  its  instance  in  a  scene.  Also,  since  they  do  not  supply  correspondence 
between  model  and  image  regions,  they  are  not  as  useful  for  reducing  the  search 
in  recognition. 

In  order  for  any  scheme  for  model-driven  selection  to  be  effective  for  reducing 
the  search  in  recognition,  it  must  meet  two  requirements:  (i)  it  must  be  sufficiently 
selective  to  avoid  many  false  positive  identifications  that  cause  needless  search  for 
matches,  and  (ii)  it  must  be  sufficiently  conservative  to  avoid  many  false  negatives, 
causing recognition to fail when it should have succeeded. A selection scheme
can produce false negatives if it does not adequately take into account the various
problems that arise in going from a model object to its image in the scene. An
object may not appear the same in the scene as it does in the model, because it has
undergone pose changes, because it is occluded, or because its colors appear different
under the current illumination conditions. In addition, artifacts such as specularities,
interreflections, and shadows may also change the appearance of the object. So how
can a model-driven selection mechanism meet these two apparently conflicting
requirements? We now describe an approach to model-driven selection that meets
some of these requirements. It makes a particular choice of model description and
assumes that this description is made available to it for selection. Since this model
description affects the way our approach formulates the color-based model-driven
selection problem, it is described first.

5.1  Model  Description 

The color region information in the model^ (that is, in an image or view of the
model) is represented as a region adjacency graph (RAG)

$$ M_G = \langle V_m, E_m, C_m, R_m, S_m, B_{rm}, B_{sm} \rangle \qquad (12) $$

where V_m = color regions in the model, E_m = adjacencies between color regions,
C_m(u) = color of region u ∈ V_m, R_m(u, v) = relative size of region v with respect
to region u, S_m(u) = size of region u, B_rm = a bound on the relative size of
regions given by R_m, and B_sm = a bound on the absolute size of regions given by
S_m.
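The model description maps naturally onto a small record type. The sketch below is one possible in-memory layout, not the author's. It assumes that color categories are string labels, that R_m is derived from S_m as an area ratio, and that B_sm is stored as a (min, max) pair of scale factors. The class name, field names and example regions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple


@dataclass
class ModelRAG:
    """Sketch of M_G = <V_m, E_m, C_m, R_m, S_m, B_rm, B_sm>."""
    regions: Set[str]                    # V_m
    adjacent: Set[FrozenSet[str]]        # E_m as undirected pairs
    color: Dict[str, str]                # C_m(u): color category label
    size: Dict[str, float]               # S_m(u), e.g. area in pixels
    rel_size_bound: float                # B_rm
    abs_size_bound: Tuple[float, float]  # B_sm as (min, max) scale factors

    def rel_size(self, u: str, v: str) -> float:
        """R_m(u, v): size of region v relative to region u (an area ratio)."""
        return self.size[v] / self.size[u]

    def is_adjacent(self, u: str, v: str) -> bool:
        """True if (u, v) is in E_m, in either order."""
        return frozenset((u, v)) in self.adjacent


model = ModelRAG(
    regions={"cap", "label"},
    adjacent={frozenset(("cap", "label"))},
    color={"cap": "blue", "label": "red"},
    size={"cap": 1200.0, "label": 300.0},
    rel_size_bound=0.5,
    abs_size_bound=(0.1, 10.0),
)
print(model.rel_size("cap", "label"))  # 0.25
```

Storing E_m as frozenset pairs makes adjacency symmetric by construction, so the lookup does not depend on the order in which the two regions are named.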

The above description exploits features of regions that tend to remain more or
less invariant in most scenes where the model appears. If the color of a model region
is specified by its color category, then, as we discussed before, it tends to remain
relatively stable (or to change in a predictable way) under variations in illumination
conditions and pose. Similarly, the adjacency information between two color regions
tends to remain more or less invariant across different appearances of the object,
as long as the two regions are visible in the given image and there are no
occlusions. Finally, the relative size of regions is preserved under changes of scale,
but it can change considerably if the pose of the object changes, say when a region
goes partially out of view. The bound B_rm on the relative size changes for each
pair of adjacent regions indicates the extent of pose changes that a selection
mechanism is expected to tolerate. Relative size changes can also occur due to
occlusions. By placing some loose bounds B_sm on the absolute size changes, the
model description restricts the changes that can be tolerated in the presence of
occlusions. If the size of a region changes beyond these bounds, that region will no
longer be considered recognizable, and the selection will then have to depend on the
evidence for other model regions in the image.

^The model description specifies a color view, that is, a range of 2D views of the
model in which one or more of the color regions described in the model are visible.
If the model has some views showing an entirely different set of color regions, then
they must be specified as separate color views.

This description is fairly rich. It contains some structural information about color
regions that can be used to restrict the number of false positives, and some
constraints on the relative and absolute size changes that can be used to restrict the
number of false negatives made by the selection mechanism.

Finally, the model description gives a way to organize the color region information
in the image analogously, as an image region adjacency graph
I_G = ⟨V_i, E_i, C_i, R_i, S_i⟩, where each term has a meaning analogous to
⟨V_m, E_m, C_m, R_m, S_m⟩ respectively.


5.2 Formulation of the Color-based Model-driven Selection Problem

In this section we formulate the color-based model-driven selection problem
as a type of subgraph matching problem. Given the image region adjacency
graph I_G, the model object, if present in the scene represented in the image, will
form a subgraph of I_G. The location strategy can be regarded as the problem
of searching for suitable subgraphs that satisfy the model description. Any such
subgraph I_g = ⟨V_g, E_g, C_g, R_g, S_g⟩ with ||V_g|| ≤ ||V_m|| and ||E_g|| ≤ ||E_m|| has
associated with it a node correspondence vector
T = {(u_m, u_g) | ∀ u_m ∈ V_m, u_g ∈ V_g ∪ {⊥}}, where ⊥ denotes a null match.
Although there are an exponential number of such subgraphs, not all of them
correspond to the model RAG. From the model description, a set of unary and binary
constraints can be derived (as described later) that make only some subgraphs
feasible. A feasible subgraph is, therefore, a subgraph all of whose nodes satisfy the
unary and binary constraints. For model-driven selection, since it is desirable to have
at most one image subgraph matching the model RAG, we can select from among
these subgraphs the subgraph(s) that in some sense best satisfy the model
description. Here we formulate color-based model-driven selection as the problem of
choosing a feasible subgraph(s) I_g that minimizes the following measure:


$$ \mathrm{SCORE}(I_g) = \left(1 - \frac{\|V_g\|}{\|V_m\|}\right) + \frac{\sum_{(u_m, v_m) \in E_m,\ T(u_m) = u_g,\ T(v_m) = v_g} R_{mg}(u_m, v_m, u_g, v_g)^2}{\|E_g\|} \qquad (13) $$

where R_mg(u_m, v_m, u_g, v_g) expresses the change in relative size when adjacent
model regions (u_m, v_m) are paired with corresponding image regions (u_g, v_g).
SCORE(I_g) combines rewards for making as many correspondences as possible, as
indicated by the first term, called Match(I_g), with penalties for a mismatch of
relative size, as indicated by the second term, called Deviation(I_g), which measures
the mean square deviation of the relative sizes. Since the subgraphs are all feasible,
the deviation accounts for occlusions and pose changes in a more refined way than
the binary constraints alone. Another advantage of this measure is that it can be
computed incrementally from individual region matches, so that a branch-and-bound
search formulation can be used to considerably reduce the search involved in
finding the best subgraph (i.e., the one with the lowest score). Finally, the above
formulation is based on the hypothesis that at least one of the regions in the
isolated subgraph corresponds to a model region. It is also designed primarily to
locate single instances of the model object in the image. More instances can be
found after removing the regions in the found instance from the image RAG.
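SCORE can be evaluated for any candidate correspondence vector. The sketch below assumes that R_mg is the difference between the model and image relative sizes and that ||E_g|| counts the model adjacencies whose endpoints are both matched. Neither assumption is stated explicitly in the text, and the function name and argument layout are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

RelSize = Callable[[str, str], float]


def score(n_model_regions: int,
          corr: Dict[str, Optional[str]],
          model_edges: Iterable[Tuple[str, str]],
          model_rel: RelSize,
          image_rel: RelSize) -> float:
    """SCORE = Match + Deviation for one correspondence vector T.

    corr maps each model region to an image region, or to None for the
    null match. model_rel / image_rel return the size of the second
    region relative to the first.
    """
    matched = sum(1 for g in corr.values() if g is not None)
    match_term = 1.0 - matched / n_model_regions

    squared = []
    for u_m, v_m in model_edges:
        u_g, v_g = corr.get(u_m), corr.get(v_m)
        if u_g is None or v_g is None:
            continue  # this model adjacency is not realized in the subgraph
        # Assumed form of R_mg: difference of the two relative sizes.
        change = model_rel(u_m, v_m) - image_rel(u_g, v_g)
        squared.append(change ** 2)
    deviation = sum(squared) / len(squared) if squared else 0.0
    return match_term + deviation


m_size = {"a": 100.0, "b": 50.0, "c": 25.0}
i_size = {"x": 200.0, "y": 90.0}
s = score(3, {"a": "x", "b": "y", "c": None},
          [("a", "b"), ("b", "c")],
          lambda u, v: m_size[v] / m_size[u],
          lambda u, v: i_size[v] / i_size[u])
print(round(s, 4))  # 0.3358
```

Here two of three model regions are matched (Match = 1/3), and the only realized adjacency changes its relative size from 0.5 to 0.45, adding a deviation of 0.0025.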

5.3  A  Color-based  Model-driven  Selection  Mechanism 

A  color-based  model-driven  selection  mechanism  was  built  using  the  above 
formulation.  The  mechanism  essentially  uses  a  search  strategy  to  find  the  best 
subgraph.  The  result  of  selection  is  the  correspondence  vector  associated  with  the 
best  subgraph.  The  search  strategy  used  the  following  constraints  to  restrict  the 
search  among  feasible  subgraphs. 

1. Unary constraints: The color and absolute region size information provided in
the model description were used to develop unary constraints on these features.
The color C_g(u_g) of an image region u_g is said to match the color C_m(u_m) of a
model region u_m if these colors belong to the same category or to compatible
categories (described in Section 2.4). With this scheme, brighter colors (of a given
hue) in the model can potentially match darker colors of the same overall hue in
the image, thus accounting for a simple lowering of illumination levels. The bounds
on the absolute size provided by B_sm act as loose size constraints to rule out some
clearly absurd scale changes (such as, say, a 100-fold increase in the smallest model
region, implying a blowup of the model beyond the image bounds).

2. Binary constraints: The adjacency (as well as non-adjacency) and relative size
information provided in the model were used as binary constraints to prune some
impossible subgraphs. Specifically, the lack of adjacency between model regions is a
powerful constraint, because two adjacent regions in the image cannot correspond
to two regions that are not adjacent in the given color description (assuming a rigid
model)^. Two regions that are adjacent in the model may, however, not appear
adjacent in a given image, due to occlusion. A simple analysis of occlusions could
rule out several false matches in such cases (for example, discarding a match if the
area spanned by the occlusion within a rectangle enclosing the candidate
non-adjacent image regions far exceeds the combined size of the corresponding
adjacent model regions). The bound on the relative sizes served as another binary
constraint: B_rm was used to constrain possible matches by requiring
R_mg(u_m, v_m, u_g, v_g) ≤ B_rm.

3.  Searching  for  the  best  subgraph 

The  search  for  the  best  subgraph  (i.e.  the  subgraph  that  minimize  the  value 
^Notice  here  that  the  search  is  for  a  given  color  view  of  the  model. 
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of  SCORE),  can  in  principle,  be  done  by  an  exhaustive  enumeration  of  subgraphs. 
But  with  the  algorithm  described  below,  the  search  required  is  reduced  to  a  large 
extent.  The  algorithm  used  is  essentially  a  variation  of  the  branch  and  bound 
interpretation  tree  (IT)  search  [12],  with  the  major  difference  being  that  no  verifi¬ 
cation  is  done  when  the  search  reaches  a  leaf  node  (as  the  task  is  selection  and  not 
recognition).  Each  level  of  the  search  tree  represents  a  possible  match  for  a  model 
region  (this  includes  a  null  match),  so  that  the  depth  of  the  search  tree  is  fixed 
by  the  number  of  nodes  in  the  model  RAG.  The  unary  constraints  are  checked  a 
priori  to  prune  the  breadth  of  the  search  tree.  A  subgraph  in  the  image  RAG  that 
is  a  potential  match  for  the  model  RAG  is  represented  by  a  path  in  the  IT.  The 
value  of  SCORE  is  updated  at  each  node  as  SCOREj+i  =  SCOREj  —  -I- 

By keeping the lowest value of SCORE found so far, the search can be cut off below any node whose Deviation(I_j) value is greater than that lowest SCORE value. In practice, the unary and binary constraints prune the search tree considerably, so that the average number of full paths (down to the leaves) explored is small (about 50). Finally, after an instance of the model has been found in the image, the selected area is removed and the search is repeated on the resulting image RAG to look for further instances of the model object.
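The search described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The RAG class, the per-match deviation function and the constant null_cost are hypothetical stand-ins for the SCORE terms defined earlier in the paper. The exact form of the relative-size test is also assumed. The sketch further assumes that the unary constraints have already produced the list of candidate image regions for each model region. Pruning on the running score is valid only because all costs are non-negative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

Assignment = Dict[int, Optional[int]]  # model region -> image region, or None (null match)


@dataclass
class RAG:
    """Hypothetical region adjacency graph: region id -> area and neighbour set."""
    area: Dict[int, float]
    adj: Dict[int, Set[int]]


def sizes_compatible(m1: int, m2: int, i1: int, i2: int,
                     model: RAG, image: RAG, b_rm: float) -> bool:
    # Assumed form of the relative-size test: the area ratio of the two image
    # regions may differ from that of the two model regions by at most a factor b_rm.
    r = (image.area[i1] / image.area[i2]) / (model.area[m1] / model.area[m2])
    return 1.0 / b_rm <= r <= b_rm


def best_subgraph(order: List[int], candidates: Dict[int, List[int]],
                  model: RAG, image: RAG,
                  deviation: Callable[[int, int], float],
                  null_cost: float, b_rm: float) -> Tuple[Assignment, float]:
    """Branch-and-bound interpretation-tree search (costs must be non-negative)."""
    best_score = float("inf")
    best: Assignment = {}

    def consistent(m: int, i: int, assign: Assignment) -> bool:
        for m2, i2 in assign.items():
            if i2 is None:
                continue
            if i2 == i:
                return False  # one image region cannot explain two model regions
            if i2 in image.adj[i] and m2 not in model.adj[m]:
                return False  # adjacent in the image but not in the (rigid) model
            if not sizes_compatible(m, m2, i, i2, model, image, b_rm):
                return False
        return True

    def search(level: int, assign: Assignment, score: float) -> None:
        nonlocal best_score, best
        if score >= best_score:
            return  # bound: this path can no longer beat the lowest SCORE so far
        if level == len(order):
            best_score, best = score, dict(assign)
            return
        m = order[level]
        for i in candidates.get(m, []):  # already pruned by the unary constraints
            if consistent(m, i, assign):
                assign[m] = i
                search(level + 1, assign, score + deviation(m, i))
                del assign[m]
        assign[m] = None  # null match: region occluded or missed by segmentation
        search(level + 1, assign, score + null_cost)
        del assign[m]

    search(0, {}, 0.0)
    return best, best_score


def find_instances(order, candidates, model, image, deviation, null_cost, b_rm,
                   max_instances: int, max_score: float = float("inf")):
    """Repeat the search after removing the image regions already selected."""
    used: Set[int] = set()
    found = []
    for _ in range(max_instances):
        remaining = {m: [i for i in c if i not in used] for m, c in candidates.items()}
        assign, score = best_subgraph(order, remaining, model, image,
                                      deviation, null_cost, b_rm)
        matched = {i for i in assign.values() if i is not None}
        if not matched or score > max_score:
            break
        found.append((assign, score))
        used |= matched
    return found


if __name__ == "__main__":
    model = RAG(area={0: 100.0, 1: 50.0}, adj={0: {1}, 1: {0}})
    image = RAG(area={10: 200.0, 11: 100.0, 12: 20.0},
                adj={10: {11}, 11: {10}, 12: set()})
    print(best_subgraph([0, 1], {0: [10, 12], 1: [11]}, model, image,
                        lambda m, i: 0.1, 1.0, 2.0))
```

In the toy example, the model has two adjacent regions and the image has three regions. The search selects the pairing {0: 10, 1: 11}. Image region 12 is rejected for model region 0 because its size relative to region 11 violates the assumed bound. find_instances then removes the selected image regions and searches again. It stops when only null matches remain or when the best score exceeds the hypothetical max_score threshold.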

5.4 Results

The results of using color-based model-driven selection are illustrated in Figures 2 and 3. Figure 2a shows a model object, and Figure 2b shows its color description, obtained with the color-region segmentation algorithm of Section 3. Here the background was removed by a simple threshold on intensities. This description is used to create a model RAG. Figure 2c shows a scene in which the model object occurs. The scene contains several other objects with one or more of the model colors. The model also appears in a different pose, rotated to the left about the vertical axis. Figure 3b shows the result of applying the unary color constraints: based on color alone, the big blue glass matches the small blue flowers. Next, the unary constraint on absurd size changes is used to prune the possibilities, and the result is shown in Figure 3c. Finally, the subgraph with the lowest value of SCORE is shown in Figure 3d. As this figure shows, a region containing most of the model object has been identified even though the color image segmentation was not perfect (note the small streak above the white rim of the cup that merges with the book in the background).

5.5 Search Reduction using Color-based Model-driven Selection

The color-based model-driven selection mechanism provides a correspondence between model regions and some image regions. The matching of model features to image features can then be restricted to corresponding regions, which reduces the number of matches that need to be tried for recognition. To reduce the search further, conventional grouping can be performed within the selected color regions, as described in Section 4.2. To estimate the search reduction in this case, we continue the analysis of that section. Let N_I be the number of solution subgraphs given by the selection mechanism, and let I_k be one such subgraph, with N_k nodes. Let G_{u_j} be the number of groups in region u_j of the solution subgraph I_k. Let G_{v_j} be the number of groups in the region v_j of the model RAG that corresponds to u_j, as implied by the correspondence vector T associated with I_k. Then, assuming as before an average group size of g, the number of matches that need to be tried is obtained by applying the grouping-alone count of Section 4.2 to each region pair, with G_{u_j} and G_{v_j} in place of G_N and G_M, and summing over the N_k regions of each of the N_I solution subgraphs. To compare this kind of selection with pure grouping, we can take some typical values of these numbers: M = 200, N = 3000, g = 7, G_M = 30, G_N = 430, G_{u_j} = 8, G_{v_j} = 5, N_I = 5, and N_k = 5. With these values, the number of matches with grouping alone is about 1.56 × 10^9. With model-driven color-based selection plus grouping, it falls to about 1.25 × 10^8. Assuming 1 microsecond per match, this corresponds to a reduction in match time from 26 minutes to about 2 minutes.

By trying several models and images of scenes in which they occurred, we recorded the average number of subgraphs generated by the model-driven selection mechanism. The search estimates were then obtained using the above formula for model-driven selection with grouping, and the formulas for the other methods mentioned in Section 4.2. The results are shown in Table II. The bound on the number of groups in a region was the same as that used in Section 4.2. As the table shows, the number of matches is always lower when the correspondence between model and image color regions is used. A curious feature of the table is that a more complex model containing several color regions (entry 1 in Table II) requires fewer matches, and hence less time, than a simple object with fewer regions (entry 2 in Table II). This is understandable: with a large number of regions, the constraints are stronger and hence there are fewer false matches.
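The arithmetic behind these estimates can be sketched in Python. The per-region cost function is a hypothetical placeholder, because the exact grouping-alone formula of Section 4.2 is not reproduced in this section. Only two parts of the sketch follow the text: the structure of the sum, over the N_I solution subgraphs and the N_k regions of each, and the conversion to time at 1 microsecond per match.

```python
from typing import Callable, Iterable, Sequence, Tuple


def matches_with_selection(subgraphs: Iterable[Sequence[Tuple[int, int]]],
                           per_region_matches: Callable[[int, int], float]) -> float:
    """Sum the per-region grouping cost over every region of every solution subgraph.

    Each subgraph I_k is given as a list of (G_u, G_v) pairs: the number of groups
    in image region u_j and in its corresponding model region v_j. The function
    per_region_matches is a hypothetical stand-in for the grouping-alone count of
    Section 4.2 applied to a single region pair.
    """
    return sum(per_region_matches(g_u, g_v) for sub in subgraphs for g_u, g_v in sub)


def match_time_minutes(n_matches: float, seconds_per_match: float = 1e-6) -> float:
    """Convert a match count into minutes, at 1 microsecond per match by default."""
    return n_matches * seconds_per_match / 60.0


# The two totals quoted in the text:
print(round(match_time_minutes(1.56e9), 1))  # grouping alone: about 26 minutes
print(round(match_time_minutes(1.25e8), 1))  # selection plus grouping: about 2 minutes
```

With N_I = N_k = 5 and the same (G_u, G_v) = (8, 5) for every region, the total is simply 25 times the per-region cost. The two quoted totals differ by a factor of roughly 12.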

Discussion: The above studies estimated the search reduction without actually integrating the selection mechanism with a recognition system. Moreover, the estimated search assumed that the selection mechanism gave no false negatives. This assumption may not hold, since the subgraph with the lowest value of SCORE may not always correspond to a match to the model. To estimate the number of false positives, the number of false negatives, and the reduction in search that results from this color-based selection mechanism, we have recently developed a 3D from 2D recognition system and are currently testing it. Preliminary results on using the selection mechanism as a front end for recognition have so far been encouraging.


6. SUMMARY

In this paper we have shown how color can be used as a cue to perform both data-driven and model-driven selection. Unlike other approaches to color, we have used the intended task to constrain the kind of color information to be extracted from images. This led to a fast color image segmentation algorithm based on a perceptual categorization of colors that yields perceptually different color regions. This color description of the image formed the basis of both data-driven and model-driven selection. A saliency measure was then developed to rank the color regions for data-driven selection. Lastly, an approach to model-driven selection was presented that exploits the description of the model's color regions to locate instances of the model in the image. We regard color as one of many cues that can be used for selection. Future research is directed towards using other cues, such as texture, to perform data-driven and model-driven selection.

APPENDIX A

In this appendix we describe the psychophysical experiments done to derive the color categories. The aim of these experiments was to record perceptual judgments of colors in different regions of the color space by exploring the space systematically. For this, the hue-saturation-value (HSV) representation of color space was used. As shown in Figure 7, the entire spectrum of computer-recordable colors (2^24 colors) was quantized into 7200 bins. This corresponds to a 5-degree resolution in hue and 10 levels of quantization of saturation and intensity values (72 × 10 × 10 = 7200). To scan the color space systematically, the colors in the bins were observed starting with the bins of red hue and going around the color space back to red again. The display setup used a calibrated 24-bit high-resolution monitor, and the colors were observed in dark-room conditions at a minimum viewing distance of 2 feet. Uniform color samples (mondrians) of size 64 × 64, corresponding to the hue, saturation, and brightness values of each bin, were displayed on the screen. The set of mondrians displayed on the screen varied in purity vertically and in intensity horizontally, while the hue was kept constant. For each hue, the colors initially displayed had a resolution of 0.2 in brightness and saturation. Four subjects were tested individually. Each was given a chart showing the gradations in brightness and purity, laid out to match the color spectrum shown on the display. Each subject was then asked to group the color samples displayed on the screen into perceptually uniform color groups and to mark the result on the chart provided. The end result was a segmentation of the chart into perceptually uniform colored groups. The presence of a boundary was taken to mark a change in color category. To locate this boundary precisely, the color samples around it were redisplayed at a finer resolution (0.1) in brightness and saturation. Before a new category label was assigned, each group was compared with the groups of the previous hue. The colors of the previous group were displayed alongside the given group, and the subject was asked to judge whether the two could be merged. Successive mondrians were observed with 10-minute intervals in between to remove after-effects of the previous display. The mondrians displayed were also sufficiently far apart on the screen to keep the effects of simultaneous contrast small. By averaging out the differences in responses between subjects, we found that about 220 different color categories were sufficient to describe the color space. The color category information was then summarized in a color look-up table.
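The quantization used in these experiments can be illustrated with a small Python sketch. It bins hue into 5-degree steps by truncation and splits saturation and value into 10 equal levels. It converts colors with the standard library's colorsys.rgb_to_hsv and orders the bins hue-major. All of these choices are assumptions, not the paper's specification. The names hsv_bin and color_category and the placeholder look-up table are hypothetical, since the actual category labels (about 220) come from the experiments and are not reproduced here.

```python
import colorsys
from typing import Sequence

HUE_STEP_DEG = 5                              # 5-degree hue resolution
HUE_BINS = 360 // HUE_STEP_DEG                # 72 hue bins
SAT_LEVELS = 10                               # 10 saturation levels
VAL_LEVELS = 10                               # 10 intensity (value) levels
N_BINS = HUE_BINS * SAT_LEVELS * VAL_LEVELS   # 7200 bins


def hsv_bin(r: int, g: int, b: int) -> int:
    """Map an 8-bit-per-channel RGB color to one of the 7200 HSV bins (hue-major)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)  # all in [0, 1]
    h_idx = min(int(h * 360) // HUE_STEP_DEG, HUE_BINS - 1)
    s_idx = min(int(s * SAT_LEVELS), SAT_LEVELS - 1)  # s = 1.0 goes to the top level
    v_idx = min(int(v * VAL_LEVELS), VAL_LEVELS - 1)
    return (h_idx * SAT_LEVELS + s_idx) * VAL_LEVELS + v_idx


def color_category(r: int, g: int, b: int, lut: Sequence[int]) -> int:
    """Look up the perceptual category label of a color in a 7200-entry table."""
    return lut[hsv_bin(r, g, b)]


# Placeholder table with every bin labelled 0. In the paper, the entries would
# hold the category labels obtained from the psychophysical experiments.
placeholder_lut = [0] * N_BINS
```

Storing the categories per bin makes labelling a pixel a constant-time table lookup. This is consistent with the look-up-table summary described above.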
Figure 1: Illustration of color region segmentation and color-saliency. (a) Input image depicting a scene of objects of different materials, with occlusions and inter-reflections. (b) Segmented image obtained with the color region segmentation algorithm. (c)-(f) The four most distinctive regions detected using the color-saliency measure. The white portion of the red book is due to the white background.


Figure 2: Illustration of model-driven selection: model and scene. (a) The object serving as the model. (b) Its color description produced by the segmentation algorithm of Section 3. (c) A cluttered scene in which the object appears.




Figure 3: Illustration of color-based model-driven selection. (a) A scene containing the model object of Figure 2a. (b) Regions selected based on the unary color constraint. (c) Regions of (b) pruned after applying the unary size constraint. (d) Regions corresponding to the best subgraph that matched the model specifications.
Figure 4: Illustration of color region segmentation and color-saliency. (a) Input image consisting of regions of 3 different colors (red, green, and blue) against an almost white background. (b) Result of step 2 of the algorithm, with regions colored differently from the original image. (c) Final segmentation of the image of Fig. 3a. (d)-(f) The three most distinctive regions found using the color-saliency measure.
Figure 5: Illustration of color region segmentation and color-saliency: another example. (a) Input image of a set of colored cloth materials. (b) Regions obtained at the end of step 2 of the algorithm (before merging overlapping regions). (c) Final segmented image, suitably recolored to show the segmented regions. (d)-(f) The three most distinctive regions found using the color-saliency measure.


Figure 6: Illustration of color region segmentation and color-saliency: last example. (a) Input image depicting a scene of different kinds of objects (cloths and a polished book). (b) The color regions extracted from (a) using the color region segmentation algorithm. (c)-(f) The four most distinctive regions detected using the color-saliency measure.
Figure 7: Illustration of the quantization of the HSV color space. (a) HSV color model. (b) A cell of the quantized color space. (c) The quantization data and the number of categories obtained.


Figure 8: Graphs of the weighting functions used in devising the color-saliency measure.
grouping. Here g = 7, the time per match is 1 microsecond, and the grouping method is as described in the text.


